Search CORE

11 research outputs found

Amortising the Cost of Mutation Based Fault Localisation using Statistical Inference

Author: An Gabin
Feldt Robert
Kim Jinhan
Yoo Shin
Publication venue
Publication date: 26/02/2019
Field of study

Mutation analysis can effectively capture the dependency between source code and test results. This has been exploited by Mutation Based Fault Localisation (MBFL) techniques. However, MBFL techniques suffer from the need to expend the high cost of mutation analysis after the observation of failures, which may present a challenge for its practical adoption. We introduce SIMFL (Statistical Inference for Mutation-based Fault Localisation), an MBFL technique that allows users to perform the mutation analysis in advance against an earlier version of the system. SIMFL uses mutants as artificial faults and aims to learn the failure patterns among test cases against different locations of mutations. Once a failure is observed, SIMFL requires either almost no or very small additional cost for analysis, depending on the used inference model. An empirical evaluation of SIMFL using 355 faults in Defects4J shows that SIMFL can successfully localise up to 103 faults at the top, and 152 faults within the top five, on par with state-of-the-art alternatives. The cost of mutation analysis can be further reduced by mutation sampling: SIMFL retains over 80% of its localisation accuracy at the top rank when using only 10% of generated mutants, compared to results obtained without sampling

arXiv.org e-Print Archive

Chalmers Research

Fonte: Finding Bug Inducing Commits from Failures

Author: An Gabin
Hong Jingun
Kim Naryeong
Yoo Shin
Publication venue
Publication date: 13/12/2022
Field of study

A Bug Inducing Commit (BIC) is a commit that introduces a software bug into the codebase. Knowing the relevant BIC for a given bug can provide valuable information for debugging as well as bug triaging. However, existing BIC identification techniques are either too expensive (because they require the failing tests to be executed against previous versions for bisection) or inapplicable at the debugging time (because they require post hoc artefacts such as bug reports or bug fixes). We propose Fonte, an efficient and accurate BIC identification technique that only requires test coverage. Fonte combines Fault Localisation (FL) with BIC identification and ranks commits based on the suspiciousness of the code elements that they modified. Fonte reduces the search space of BICs using failure coverage as well as a filter that detects commits that are merely style changes. Our empirical evaluation using 130 real-world BICs shows that Fonte significantly outperforms state-of-the-art BIC identification techniques based on Information Retrieval as well as neural code embedding models, achieving at least 39% higher MRR. We also report that the ranking scores produced by Fonte can be used to perform weighted bisection, further reducing the cost of BIC identification. Finally, we apply Fonte to a large-scale industry project with over 10M lines of code, and show that it can rank the actual BIC within the top five commits for 87% of the studied real batch-testing failures, and save the BIC inspection cost by 32% on average.Comment: accepted to ICSE'23 (not the final version

arXiv.org e-Print Archive

Learning Test-Mutant Relationship for Accurate Fault Localisation

Author: An Gabin
Feldt Robert
Kim Jinhan
Yoo Shin
Publication venue
Publication date: 04/06/2023
Field of study

Context: Automated fault localisation aims to assist developers in the task of identifying the root cause of the fault by narrowing down the space of likely fault locations. Simulating variants of the faulty program called mutants, several Mutation Based Fault Localisation (MBFL) techniques have been proposed to automatically locate faults. Despite their success, existing MBFL techniques suffer from the cost of performing mutation analysis after the fault is observed. Method: To overcome this shortcoming, we propose a new MBFL technique named SIMFL (Statistical Inference for Mutation-based Fault Localisation). SIMFL localises faults based on the past results of mutation analysis that has been done on the earlier version in the project history, allowing developers to make predictions on the location of incoming faults in a just-in-time manner. Using several statistical inference methods, SIMFL models the relationship between test results of the mutants and their locations, and subsequently infers the location of the current faults. Results: The empirical study on Defects4J dataset shows that SIMFL can localise 113 faults on the first rank out of 224 faults, outperforming other MBFL techniques. Even when SIMFL is trained on the predicted kill matrix, SIMFL can still localise 95 faults on the first rank out of 194 faults. Moreover, removing redundant mutants significantly improves the localisation accuracy of SIMFL by the number of faults localised at the first rank up to 51. Conclusion: This paper proposes a new MBFL technique called SIMFL, which exploits ahead-of-time mutation analysis to localise current faults. SIMFL is not only cost-effective, as it does not need a mutation analysis after the fault is observed, but also capable of localising faults accurately.Comment: Paper accepted for publication at IST. arXiv admin note: substantial text overlap with arXiv:1902.0972

arXiv.org e-Print Archive

PyGGI 2.0: Language independent genetic improvement framework

Author: An Gabin
Binkley Dave
Eén Niklas
Marginean Alexandru
Petke J.
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 12/08/2019
Field of study

PyGGI is a research tool for Genetic Improvement (GI), that is designed to be versatile and easy to use. We present version 2.0 of PyGGI, the main feature of which is an XML-based intermediate program representation. It allows users to easily define GI operators and algorithms that can be reused with multiple target languages. Using the new version of PyGGI, we present two case studies. First, we conduct an Automated Program Repair (APR) experiment with the QuixBugs benchmark, one that contains defective programs in both Python and Java. Second, we replicate an existing work on runtime improvement through program specialisation for the MiniSAT satisfiability solver. PyGGI 2.0 was able to generate a patch for a bug not previously fixed by any APR tool. It was also able to achieve 14% runtime improvement in the case of MiniSAT. The presented results show the applicability and the expressiveness of the new version of PyGGI. A video of the tool demo is at: https://youtu.be/PxRUdlRDS40

Crossref

UCL Discovery

Learning test-mutant relationship for accurate fault localisation

Author: An Gabin
Feldt Robert
Kim Jinhan
Yoo Shin
Publication venue: 'Elsevier BV'
Publication date: 01/01/2023
Field of study

Context: Automated fault localisation aims to assist developers in the task of identifying the root cause of the fault by narrowing down the space of likely fault locations. Simulating variants of the faulty program called mutants, several Mutation Based Fault Localisation (MBFL) techniques have been proposed to automatically locate faults. Despite their success, existing MBFL techniques suffer from the cost of performing mutation analysis after the fault is observed. Method: To overcome this shortcoming, we propose a new MBFL technique named SIMFL (Statistical Inference for Mutation-based Fault Localisation). SIMFL localises faults based on the past results of mutation analysis that has been done on the earlier version in the project history, allowing developers to make predictions on the location of incoming faults in a just-in-time manner. Using several statistical inference methods, SIMFL models the relationship between test results of the mutants and their locations, and subsequently infers the location of the current faults. Results: The empirical study on DEFECTS4J dataset shows that SIMFL can localise 113 faults on the first rank out of 224 faults, outperforming other MBFL techniques. Even when SIMFL is trained on the predicted kill matrix, SIMFL can still localise 95 faults on the first rank out of 194 faults. Moreover, removing redundant mutants significantly improves the localisation accuracy of SIMFL by the number of faults localised at the first rank up to 51. Conclusion: This paper proposes a new MBFL technique called SIMFL, which exploits ahead-of-time mutation analysis to localise current faults. SIMFL is not only cost-effective, as it does not need a mutation analysis after the fault is observed, but also capable of localising faults accurately

Chalmers Research